STATISTICAL SOFTWARE R IN CORPUS-DRIVEN RESEARCH AND MACHINE LEARNING
Authors
Abstract
The rapid development of computer software and network technologies has facilitated the intensive application of specialized statistical software not only in the traditional information technology spheres (i.e., statistics, engineering, artificial intelligence) but also in linguistics. R is one of the most popular analytical tools for processing a huge array of digitalized language data, especially in the quantitative corpus linguistic studies of Western Europe and North America. This article discusses the functionality of the software package R, focusing on its advantages in performing complex statistical analyses of linguistic data in corpus-driven studies and in creating linguistic classifiers in machine learning. With this in mind, a three-stage strategy of computer-statistical analysis of linguistic corpus data is elaborated: 1) preparing the data to be subjected to a statistical procedure, 2) utilizing hypothesis testing methods (MANOVA, ANOVA) and the Tukey post-hoc test, 3) developing a model of a linguistic classifier and analyzing its effectiveness. The strategy is implemented on 11,000 tokens of English detached nonfinite constructions with an explicit subject extracted from the BNC-BYU corpus. The statistical analysis indicates significant differences in the realization of the factors of the parameter “Part of speech of the subject”. The analyzed factors are employed to build a machine learning model for the classification of the given constructions. Particular attention is devoted to the methodological perspectives of interdisciplinary research in the fields of linguistics, statistics, and machine learning studies. The potential of the elaborated case study in training undergraduate, master, and postgraduate students of Applied Linguistics is indicated. The article provides all the codes written in R script with comprehensive descriptions and explanations. The concluding part summarizes the obtained results and highlights the issues for further research connected with the popularization of R and with raising the awareness of specialists in this statistical analysis system.
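The hypothesis-testing stage named in the abstract (one-way ANOVA) can be illustrated with a minimal sketch. The article itself works in R and its code is not reproduced on this page, so the following is a standard-library Python illustration only. The function name `one_way_anova_f`, the grouping labels, and all numbers are hypothetical and are not taken from the BNC-BYU data; the Tukey post-hoc test and the classifier stage are omitted.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists of numeric observations, one list per factor level.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)          # number of factor levels
    n = len(all_values)      # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical toy data: some numeric measure of a construction,
# grouped by the part of speech of its subject (assumed labels).
samples = {
    "noun": [4, 5, 6],
    "pronoun": [1, 2, 3],
    "other": [7, 8, 9],
}

f_stat, df_b, df_w = one_way_anova_f(list(samples.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")  # F(2, 6) = 27.00
```

A large F value relative to the F distribution with the reported degrees of freedom suggests that at least one group mean differs; a post-hoc test such as Tukey's would then be used to find which pairs differ.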
Similar resources
Software Effort Prediction using Statistical and Machine Learning Methods
Accurate software effort estimation is an important part of software process. Effort is measured in terms of person months and duration. Both overestimation and underestimation of software effort may lead to risky consequences. Also, software project managers have to make estimates of how much a software development is going to cost. The dominant cost for any software is the cost of calculating...
Statistical and Machine Learning
• Course plan: See Table of Contents (tentative). We will emphasize knowing "why" and statistical aspects rather than algorithms and programming. But you still have to know "how", either by writing your own implementation or by modifying others' code. • Grading policy: homework 30%; score for late homework = (full points) × 0.8^d, where d is the number of delay days. Oral presentation 20% on assigned task...
A Machine Learning Approach for Statistical Software Testing
Some Statistical Software Testing approaches rely on sampling the feasible paths in the control flow graph of the program; the difficulty comes from the tiny ratio of feasible paths. This paper presents an adaptive sampling mechanism called EXIST for Exploration/eXploitation Inference for Software Testing, able to retrieve distinct feasible paths with high probability. EXIST proceeds by alterna...
Semantics-Driven Statistical Machine Translation
Semantic parsing, the task of mapping natural language sentences to logical forms, has recently played an important role in building natural language interfaces and question answering systems. In this talk, I will present three ways in which semantic parsing relates to machine translation: First, semantic parsing can be viewed *as* a translation task with many of the familiar issues, e.g., dive...
mlr: Machine Learning in R
The mlr package provides a generic, object-oriented, and extensible framework for classification, regression, survival analysis and clustering for the R language. It provides a unified interface to more than 160 basic learners and includes meta-algorithms and model selection techniques to improve and extend the functionality of basic learners with, e.g., hyperparameter tuning, feature selection...
Journal
Journal title: Information Technologies and Learning Tools
Year: 2021
ISSN: 2076-8184
DOI: https://doi.org/10.33407/itlt.v86i6.4627